170 research outputs found

    The PAPILLON project: cooperatively building a multilingual lexical data-base to derive open source dictionaries and lexicons

    No full text
    International audienceThe PAPILLON project aims at creating a cooperative, free, permanent, web-oriented and personalizable environment for the development and the consultation of a multilingual lexical database. The initial motivation is the lack of dictionaries, both for humans and machines, between French and many Asian languages. In particular, although there are large F-J paper usage dictionaries, they are usable only by Japanese literates, as they never contain both original (kanji/kana) and romaji writing. This applies as well to Thai, Vietnamese, Lao, etc

    BEYTrans: A Free Online Collaborative Wiki-Based CAT Environment Designed for Online Translation Communities

    Get PDF
    PACLIC 21 / Seoul National University, Seoul, Korea / November 1-3, 200

    Learning-to-Translate Based on the S-SSTC Annotation Schema

    Get PDF

    A Framework for Data Management for the Online Volunteer Translators\u27 Aid System QRLex

    Get PDF
    PACLIC 19 / Taipei, taiwan / December 1-3, 200

    Towards Higher Quality Internal and Outside Multilingualization of Web Sites

    No full text
    International audienceThe multilingualization of Web sites with high quality is increasingly important, but is unsolvable in most situations where internal quality certification is needed, and not solved in the majority of other situations. We demonstrate it by analyzing a variety of techniques to make the underlying software easily localizable and to manage the translation of textual content in the classical internal mode, that is by modifying the language-dependent resources. A new idea is that volunteer final users should be able to contribute to the improvement oreven production of translated resources and content. For this, we have developed a PHP piece of code which naive webmasters (not computer scientists nor professional translators) can add to a Web site to enable internal multilingualization by users with enough access rights: in management mode, these users can edit the texts of titles, button labels, messages, etc. in text areas appearing in context in the Web page. If Web site developers follow some recommendations, all textual interface elements should be localizable in this way. Another angle of attack, applicable in all cases where navigating a site though a gateway is possible, consists in replacing the problem of diffusion by the problem of access in multiple lang uages. We introduce the concept of iMAG (interactive Multilingual Access Gateway, dedicated to a Web site or domain) to solve the problem of higher quality multilingual access. First, by using available MT systems or by default morphological processors and bilingual dictionaries, any page of an elected website is made instantly accessible in many languages, with a generally low quality profile, as through usual translation gateways. Over time, the quality profile of textual GUI elements, Web pages and even documents (if accessible in html) will improve thanks to outside contributors, who will post-edit or produce the translations from the reading context. This is only possible because the iMAG associated to the website stores the translations in its translation memory (TM) and the contributed dictionary items it its dictionary. The TM has quality levels, according to the users' profiles, and scores within levels. An API will be proposed so that the developers of the elected website can connect their to its iMAG, retrieve the best level translations, certify them if necessary, and put them in their localized resources. At that point, external localization meets internal localization

    Operationalization of interactive Multilingual Access Gateways (iMAGs) in the Traouiero project

    No full text
    International audienceWe will explain and demonstrate iMAGs (interactive Multilingual Access Gateways), in particular on a scientific laboratory web site and on the Greater Grenoble (La Métro) web site. This bilingual presentation has been obtained using an iMAG. Presentation This presentation is an adaptation and update of an article presented as a demonstration only to TALN-2010. The names of the files have been kept the same, although their contents are slightly different. The iMAG concept has been proposed by Ch. Boitet and V. Bellynck in 2006 (Boitet & al. 2008, Boitet & al. 2005), and reached prototype status in November 2008, with a first demonstration on the LIG laboratory Web site. It has been adapted to the DSR (Digital Silk Road) Web site in April 2009, and then to more than 50 other Web sites. These first prototypes are extensions of the SECTra_w (Huynh & al. 2008) online translation corpora support system. Since the beginning of 2011, we are operationalizing this software with a view to deploy it as a multilingual access infrastructure, in the context of the French ANR (National Agency for Research) Traouiero " emergence " project. An iMAG is an interactive Multilingual Access Gateway very much like Google Translate at first sight: one gives it a URL (starting Web site) and an access language and then navigates in that access language. When the cursor hovers over a segment (usually a sentence or a title), a palette shows the source segment and proposes to contribute by correcting the target segment, in effect post-editing an MT result. With Google Translate, the page does not change after contribution, and if another page contains the same segment, its translation is still the rough MT result, not the polished post-edited version. The more recent Google Translation Toolkit enables one to MT-translate and then post-edit online full Web pages from sites such as Wikipedia, but again the corrected segments don't appear when one later browses the Wikipedia page in the access language. By contrast, an iMAG is dedicated to an elected Web site, or rather to the elected sublanguage defined by one or more URLs and their textual content. It contains a translation memory (TM) and a specific, preterminological dictionary (pTD), both dedicated to the elected sublanguage. Segments are pretranslated not by a unique MT system, but by a (selectable) set of MT systems. Systran and Google are mainly used now, but specialized systems developed from the postedited part of the TM, and based on Moses, will be also used in the future. The powerful online contributive platforms SECTra_w and PIVAX (Nguyen & al. 2007) are used to support the TMs and pTDs. Translated pages are built with the best segment translations available so far. While reading a translated page, it is possible not only to contribute to the segment under the cursor, but also to seamlessly switch to SECTra_w online post-editing environment, equipped with proactive dictionary help and good filtering and search-and-replace functions, and then back to the reading context. A translation relay is being implemented to define the iMAGs or other translation gateways used by an elected Web site, select and parameterize the MT systems and translation routes used for various language pairs, and manage users, groups, projects (some contributions may be organized, other opportunistic), and access rights. Finally, MT systems tailored to the selected sublanguage can be built (by combinations of empirical and expert methods) from the TM and the pTD dedicated to a given elected Web site. That approach will inherently raise the linguistic and terminological quality of the MT results, hopefully converting them from rough into raw translations. The demonstration will use some iMAGs created by the AXiMAG startup for various Web sites, such as those of the LIG lab (http://service.aximag.fr:8180/xwiki/bin/view/imag/liglab) and of La Metro (Greater Grenoble) web site (http://service.aximag.fr:8180/xwiki/bin/view/imag/lametro), where access in Chinese and English was enabled in 2010 for the Shanghai Expo

    Multilinguisation d'ontologies dans le cadre de la recherche d'information translingue dans des collections d'images accompagnées de textes spontanés

    Get PDF
    Le Web est une source proliférante d'objets multimédia, décrits dans différentes langues natu- relles. Afin d'utiliser les techniques du Web sémantique pour la recherche de tels objets (images, vidéos, etc.), nous proposons une méthode d'extraction de contenu dans des collections de textes multilingues, paramétrée par une ou plusieurs ontologies. Le processus d'extraction est utilisé pour indexer les objets multimédia à partir de leur contenu textuel, ainsi que pour construire des requêtes formelles à partir d'énoncés spontanés. Il est basé sur une annotation interlingue des textes, conservant les ambiguïtés de segmentation et la polysémie dans des graphes. Cette première étape permet l'utilisation de processus de désambiguïsation factorisés au niveau d'un lexique pivot (de lexèmes interlingues). Le passage d'une ontologie en paramètre du système se fait en l'alignant de façon automatique avec le lexique interlingue. Il est ainsi possible d'utiliser des ontologies qui n'ont pas été conçues pour une utilisation multilingue, et aussi d'ajouter ou d'étendre l'ensemble des langues et leurs couvertures lexicales sans modifier les ontologies. Un démonstrateur pour la recherche multilingue d'images, développé pour le projet ANR OMNIA, a permis de concrétiser les approches proposées. Le passage à l'échelle et la qualité des annotations produites ont ainsi pu être évalués.The World Wide Web is a proliferating source of multimedia objects described using various natural languages. In order to use semantic Web techniques for retrieval of such objects (images, videos, etc.), we propose a content extraction method in multilingual text collections, using one or several ontologies as parameters. The content extraction process is used on the one hand to index multimedia objects using their textual content, and on the other to build formal requests from spontaneous user requests. The process is based on an interlingual annotation of texts, keeping ambiguities (polysemy and segmentation) in graphs. This first step allows using common desambiguation processes at th elevel of a pivot langage (interlingual lexemes). Passing an ontology as a parameter of the system is done by aligning automatically its elements with the interlingual lexemes of the pivot language. It is thus possible to use ontologies that have not been built for a specific use in a multilingual context, and to extend the set of languages and their lexical coverages without modifying the ontologies. A demonstration software for multilingual image retrieval has been built with the proposed approach in the framework of the OMNIA ANR project, allowing to implement the proposed approaches. It has thus been possible to evaluate the scalability and quality of annotations produiced during the retrieval process.SAVOIE-SCD - Bib.électronique (730659901) / SudocGRENOBLE1/INP-Bib.électronique (384210012) / SudocGRENOBLE2/3-Bib.électronique (384219901) / SudocSudocFranceF
    corecore